STD-WORDS 


FEATURES

- STD Bus Compatible
- VOTRAX SC-01 Speech Synthesizer Device
- Phoneme Driven: Unlimited Vocabulary
- Easy To Use
- Text-To-Speech Capabilities
- Parallel Bus I/O for Fast Data Transfers
- Occupies Single I/O Port Address
- On-board Audio Amplifier (800 milliwatt)
- Direct Connect to 8 Ohm Speaker
- 1 Year Warranty

Figure 1. STD-WORDS

DESCRIPTION 

The COLEX STD-WORDS board is a cost-effective means of adding electronic
speech to any STD Bus microcomputer system. The STD-WORDS is based on the
Votrax SC-01 Phoneme Speech Synthesizer, is easy to use, and can synthesize
continuous speech with an unlimited vocabulary from extremely low data rate
6-bit codes.


BLOCK DIAGRAM 




Figure 3. STD-WORDS Control Header and Connector Locations 
THEORY OF OPERATION


There have been a number of schemes devised to allow human sounding speech 
to be generated by computer systems. These schemes range from analog 
formant filtering of waveforms and noise to wholesale digitization of the 
analog voice signal and then later digital to analog regeneration of the 
original signal. 

The quality of the resulting speech ranges from passable for the former
approach (using moderate data rates) to outstanding for the latter (using
extremely high data rates). But in neither case is the ratio of speech
quality to memory requirements satisfactory.


Signal Digitization 

Speech digitization is a brute-force procedure that produces the highest
quality speech but uses the most memory. The analog signal is sampled at a
rate roughly twice the highest frequency of interest expected in the signal.
At each sampling point, the instantaneous voltage of the signal is digitized
and stored as a byte of data in consecutive memory locations. Each word in
the vocabulary to be spoken by the machine is input, digitized, and assigned
a block of memory.

To regenerate the original signal, each word to be spoken is selected and 
the corresponding block of memory addressed. Then consecutive memory 
locations within that block are read at the same rate as the signal was 
originally sampled and the contents applied to a digital to analog converter 
and low-pass filter. This procedure is repeated for each word to be spoken. 
The resulting output is a faithful recreation of the original signal. 
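As a rough illustration of the record-and-playback scheme above, the following Python sketch samples a test signal at 16 kHz, stores each sample as one unsigned byte, and reads the bytes back in order to recover the levels a digital to analog converter would output. It is a simulation only, not the method of any particular product: the 8-bit offset-binary encoding and the function names `digitize` and `regenerate` are assumptions for illustration, and the output low-pass filter is not modeled.

```python
import math

SAMPLE_RATE_HZ = 16_000  # sampling rate suggested in the text for speech


def digitize(signal, duration_s, rate_hz=SAMPLE_RATE_HZ):
    """Sample signal(t) (expected range -1..+1) and store each sample as one byte.

    Assumed encoding: offset binary, -1.0 -> 0 and +1.0 -> 255.
    """
    n = int(duration_s * rate_hz)
    samples = bytearray()
    for i in range(n):
        v = max(-1.0, min(1.0, signal(i / rate_hz)))  # clip to converter range
        samples.append(round((v + 1.0) * 127.5))       # quantize to 0..255
    return samples


def regenerate(samples):
    """Read consecutive bytes back and map them to D/A output levels (-1..+1)."""
    return [b / 127.5 - 1.0 for b in samples]


if __name__ == "__main__":
    tone = lambda t: math.sin(2 * math.pi * 1000 * t)  # 1 kHz test tone
    stored = digitize(tone, 1.0)
    print(len(stored), "bytes for one second")          # 16000 bytes
```

One second of signal produces 16,000 bytes, which is the memory figure the text arrives at below; the quantization error per sample is at most half a step (1/255 of full scale in this encoding).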

Human speech (especially female speech) contains significant harmonic energy
up to and somewhat beyond 8 kHz. Therefore, to provide good quality
reproduction and to prevent excessive aliasing, the sampling rate should be
around 16 kHz. (Aliasing produces beat notes between the sampling frequency
and the higher frequencies of the sampled signal; these beat notes fall
within the pass band of the output low-pass filter and therefore cannot be
easily removed from the output signal.)
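The beat notes described above can be located with the usual folding rule: under ideal sampling at rate fs, a component at frequency f appears at |f - k·fs| for the nearest integer k, i.e. folded into the range 0 to fs/2. This small Python sketch (the function name is hypothetical) computes that apparent frequency.

```python
def alias_frequency(f_signal_hz, f_sample_hz):
    """Apparent frequency of a tone after ideal sampling.

    Folds f_signal into 0..f_sample/2, which is where any alias lands
    relative to the output low-pass filter.
    """
    f = f_signal_hz % f_sample_hz
    return min(f, f_sample_hz - f)


if __name__ == "__main__":
    # A 10 kHz component sampled at 16 kHz reappears at 6 kHz, inside the speech band.
    print(alias_frequency(10_000, 16_000))
    # Lowering the rate to 12 kHz folds 8 kHz speech energy down to 4 kHz.
    print(alias_frequency(8_000, 12_000))
```

Components below half the sampling rate are returned unchanged, which is why the text pairs an 8 kHz speech bandwidth with a 16 kHz sampling rate.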

The memory required to store the digitized speech works out to about 16
kbytes per second of speech, and reconstructing the speech requires a data
rate of 16 kbytes per second (one byte every 62.5 microseconds). Various
methods have been developed to reduce the memory requirements and data rates
of this sort of system. These have included reducing the bandwidth of the
input signal and/or the sampling rate, reducing the conversion resolution,
and special data compaction routines, but none has produced satisfactory
results for low cost systems.
Formant Synthesis

In formant synthesis systems, electronic filter networks are designed to
model the human vocal tract. These filters have variable characteristics,
are implemented by either analog or (more recently) digital methods, and
closely model the qualities of the tongue, teeth, resonant cavities, and
other components of the vocal tract.

Excitation signals consisting of periodic and random waveforms, resembling
vocal cord pitch and the sound of air turbulence, are applied to the vocal
tract model filter network. The filter network removes unwanted portions of
the excitation signal spectra, just as the components of the human vocal
tract remove portions of the signal from the vocal cords. An algorithm in the
computer generates, in real time for each word spoken, the complex control
signals for the filter network as well as the filter excitation signals. The
characteristics of both the excitation signals and the filter network are
varied together in such a way as to produce artificial speech.
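To make the source-filter idea concrete, here is a minimal Python sketch of digital formant synthesis: a pulse train (standing in for the vocal cords) plus optional random noise (air turbulence) is passed through a cascade of two-pole resonators, one per formant. The resonator uses the widely published Klatt-style coefficients; this is a textbook technique chosen for illustration, not the SC-01's internal switched-capacitor design, and the function names, sample rate, and any formant values you pass in are assumptions.

```python
import math
import random


def resonator(x, freq_hz, bw_hz, fs_hz):
    """Two-pole digital resonator modeling one formant (Klatt-style coefficients)."""
    t = 1.0 / fs_hz
    c = -math.exp(-2 * math.pi * bw_hz * t)
    b = 2 * math.exp(-math.pi * bw_hz * t) * math.cos(2 * math.pi * freq_hz * t)
    a = 1.0 - b - c                  # normalizes the gain at 0 Hz to 1
    y1 = y2 = 0.0
    out = []
    for sample in x:
        y = a * sample + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def excitation(n, fs_hz, pitch_hz, voiced=True, noise_level=0.0, seed=0):
    """Pulse train at the pitch frequency plus optional uniform noise."""
    rng = random.Random(seed)
    period = max(1, round(fs_hz / pitch_hz))
    return [(1.0 if voiced and i % period == 0 else 0.0)
            + noise_level * rng.uniform(-1.0, 1.0) for i in range(n)]


def synthesize(formants, n, fs_hz=10_000, pitch_hz=100, noise_level=0.0):
    """Excite a cascade of resonators; formants is a list of (freq_hz, bw_hz)."""
    signal = excitation(n, fs_hz, pitch_hz, noise_level=noise_level)
    for freq, bw in formants:
        signal = resonator(signal, freq, bw, fs_hz)
    return signal


if __name__ == "__main__":
    # Hypothetical formant settings, for illustration only.
    wave = synthesize([(500, 60), (1500, 90)], n=1000)
    print(len(wave))
```

Varying the excitation parameters and the resonator (freq, bw) pairs over time is what the text calls varying the excitation signals and filter network together.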

In ordinary formant synthesis systems, the excitation signal characteristics
as well as the control parameters (parametric data) for the filter network
are analyzed for each word in the vocabulary and stored in memory. When the
word is to be spoken, these data are read from memory and applied to the
signal generators and filter network, and the system speaks the word. This
realizes a significant savings in both data rate and memory requirements
over voice digitization systems, since only filter and excitation signal
parameters must be stored for each word in the vocabulary.


Phoneme Synthesis 

The next level of data rate and storage reduction is achieved when formant 
synthesis data for entire words are no longer stored. This requires that 
the words of speech be broken down into a number of basic sounds that can be 
strung together in various combinations to form the words of English speech. 
These basic sounds are called phonemes. 

It has been found that English speech can be broken down into 62 phonemes
(along with two different no-sound intervals) that can be strung together in
various combinations to adequately reconstruct most English words. Now,
instead of storing the formant synthesis data for each word in the
vocabulary, only the data for these 64 sounds need be stored, for a
vocabulary of any size. A phoneme list for each word to be spoken is either
stored in memory or obtained from an outside source at the time of speech.

To create speech, the computer reads the formant synthesis data from memory
for each phoneme required to simulate the word to be spoken. This data is
applied to the excitation signal generators and the filter network to create
a string of phonemes approximating the desired word.
The SC-01 Phoneme Synthesizer Chip

The next reduction in data rate and memory requirements for speech synthesis
systems (and the one used by the COLEX STD-WORDS board) removes the complex
filter network parameter and excitation signal control duties from the host
processor and gives them to a hardware device designed specifically for the
job. This eliminates about 99% of the processor overhead required for
synthetic speech. All that is now required from the host is a list of the
phonemes for the word to be spoken. The hardware does the rest.

In the case of the COLEX STD-WORDS board, the phoneme synthesis hardware
consists of a VOTRAX SC-01 LSI CMOS phoneme synthesizer chip and a few other
items such as data latches, logic blocks, and an audio amplifier. The SC-01
can produce 62 different phonemes and two no-sound intervals, each of which
is selected by a 6-bit binary code.
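Because the host only has to supply 6-bit codes, a spoken word reduces to a short byte string. The Python sketch below uses a few entries from the phoneme chart later in this manual and the phoneme list of the HELLO example; the dictionary and function names are illustrative only, not part of any COLEX software.

```python
# Subset of the SC-01 phoneme chart in this manual (symbol -> 6-bit code).
PHONEME_CODES = {
    "H": 0x1B, "EH1": 0x02, "UH3": 0x23, "L": 0x18,
    "O1": 0x35, "U1": 0x37, "PA1": 0x3E, "STOP": 0x3F,
}

# Phoneme list for "HELLO", taken from the DATA table in this manual.
HELLO = ["H", "EH1", "UH3", "L", "UH3", "O1", "U1", "PA1", "STOP"]


def phonemes_to_codes(symbols):
    """Translate a phoneme list into the bytes written to the synthesizer."""
    codes = []
    for sym in symbols:
        if sym not in PHONEME_CODES:
            raise KeyError(f"phoneme {sym!r} is not in this table")
        codes.append(PHONEME_CODES[sym])
    return bytes(codes)


if __name__ == "__main__":
    print(phonemes_to_codes(HELLO).hex(" "))
```

The whole word costs nine bytes, independent of how long it takes to speak.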

Synthetic speech using the SC-01 requires an average data rate of
approximately 70 bits (less than 9 bytes) per second. The normalized
duration of each phoneme ranges from about 47 milliseconds to 250
milliseconds. As the clock rate is increased, the pitch of the phoneme rises
and its duration decreases (standard timing and pitch result from a 720 kHz
clock rate).
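The figures quoted above can be cross-checked with a few lines of arithmetic. This Python sketch compares 16 kHz, 8-bit digitization with the SC-01's average of about 70 bits per second, and, assuming every transmitted bit belongs to a 6-bit phoneme code, infers the implied average phoneme length. The function name and that 6-bits-per-phoneme assumption are ours, not the manual's.

```python
def compare_data_rates(sample_rate_hz=16_000, bits_per_sample=8,
                       sc01_bits_per_s=70, bits_per_code=6):
    """Contrast digitized speech with phoneme codes for the SC-01."""
    digitized = sample_rate_hz * bits_per_sample       # bits per second
    codes_per_s = sc01_bits_per_s / bits_per_code      # phonemes per second
    return {
        "digitized_bits_per_s": digitized,
        "reduction_factor": digitized / sc01_bits_per_s,
        "phonemes_per_s": codes_per_s,
        "avg_phoneme_ms": 1000 / codes_per_s,
    }


if __name__ == "__main__":
    for name, value in compare_data_rates().items():
        print(f"{name}: {value:.1f}")
```

The result is a reduction of roughly 1,800 to 1, and an implied average phoneme length of about 86 ms, which sits inside the 47 to 250 ms range given for the SC-01.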

The SC-01 device can be divided into two functional parts (Figure 4). The
first part, called the phoneme controller, translates the 6-bit digital word
applied to its input lines into the required phoneme. Internal algorithms
and look-up tables generate the complex array of temporal vs. spectral
parameters that control the second part of the device (the filter network
model and excitation signal generators) for the selected phoneme.


Figure 4. The SC-01 Phoneme Synthesizer Chip 
The second part of the device contains a pair of signal sources. The first
is a variable frequency oscillator for simulating the action of the vocal 
cords. The second is a pink noise generator that simulates the sound of 
rushing air. These signal sources are controlled by the phoneme controller 
to match the requirements of the desired phoneme. 

The outputs of these two sources are summed to form the excitation signal
that is applied to the filter network. This network is composed of four
bandpass filters that simulate the voicing components of the human vocal
tract (tongue, sinus cavities, teeth, etc.). The transfer function of each
of the four filters is instantaneously adjusted by the phoneme controller
(parametric data) in real time to produce the required phoneme.

The combination of controlling the excitation signal sources (inflection)
and the parametric data driving the filter network (voicing) produces
life-like phonemes that can be combined to provide good quality speech.


The SC-01 Phoneme Synthesizer Chip Operation 

A six-bit data word corresponding to the desired phoneme is applied to the
six data lines of the SC-01. The data word is latched into the register and
applied to the phoneme controller by bringing the STROBE line down. This
also RESETS the ACKNOWLEDGE/REQUEST (A/R) flip-flop, causing that line to go
low.

Once data is applied to the phoneme controller, it determines which phoneme 
is desired, generates a spectral matrix for that phoneme, and operates the 
excitation signals and filter network accordingly. The output sound is 
amplified and appears at the OUTPUT pin of the chip. 

Each phoneme is internally timed and has a duration of 47 to 250
milliseconds. When generation of the phoneme is complete, the A/R flip-flop
is SET, causing the A/R line to go high to signify that the synthesizer is
ready for the next phoneme data word. A dynamic articulation controller
provides a smooth transition from one phoneme to the next, producing higher
quality speech.
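The strobe and A/R sequence just described behaves like a simple set/reset flag with a timer. The Python class below is a toy timing model of that handshake, with no audio. The class and method names are hypothetical, and the caller supplies each phoneme's duration, whereas the real chip times phonemes internally.

```python
class SC01Handshake:
    """Toy timing model of the SC-01 STROBE and A/R handshake (no audio)."""

    def __init__(self):
        self.ar = True          # A/R high: ready for the next phoneme
        self.busy_until = 0.0   # time (s) at which the current phoneme times out
        self.current = None     # phoneme code now sounding

    def strobe(self, code, now, duration_s):
        """Latch a 6-bit code and RESET the A/R flip-flop (A/R goes low)."""
        self.current = code & 0x3F
        self.ar = False
        self.busy_until = now + duration_s

    def poll(self, now):
        """SET the A/R flip-flop once the phoneme has timed out; return A/R."""
        if not self.ar and now >= self.busy_until:
            self.ar = True
        return self.ar


if __name__ == "__main__":
    chip = SC01Handshake()
    chip.strobe(0x1B, now=0.0, duration_s=0.1)    # hypothetical 100 ms phoneme
    print(chip.poll(0.05), chip.poll(0.10))       # busy, then ready
```

Note that `current` is left unchanged when A/R goes high: the phoneme keeps sounding until the next strobe, which is why the host has no tight timing deadline.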

An on-chip clock oscillator supplies a clock signal to the various parts of
the SC-01 device. The clock rate is adjusted by changing the external RC
time constant. A nominal clock rate of 720 kHz provides the standard
phoneme pitch and durations. Increasing the clock rate raises the pitch of
the output sound and reduces the duration of the phonemes. Accurate
phonemes are still generated because all parts of the synthesizer track
each other closely.

This tracking is possible because the filter network is of the
switched-capacitor type, and the passband of such a filter depends on the
clock rate applied to it. Therefore, as the clock rate increases, the filter
excitation signal frequency increases and the passband frequencies of the
filter network rise with it.
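If everything on the chip is derived from the one clock, durations scale as 1/clock and frequencies scale as clock. The Python sketch below applies that proportional-scaling assumption (stated by the text qualitatively, not as an exact specification) relative to the 720 kHz nominal rate. The 100 Hz pitch in the example is a hypothetical value, not an SC-01 figure.

```python
NOMINAL_CLOCK_HZ = 720_000  # clock rate giving standard timing and pitch


def scale_for_clock(clock_hz, duration_ms, pitch_hz):
    """Return (duration_ms, pitch_hz) at a new clock, assuming linear tracking."""
    k = clock_hz / NOMINAL_CLOCK_HZ
    return duration_ms / k, pitch_hz * k


if __name__ == "__main__":
    # Longest phoneme (250 ms) and a hypothetical 100 Hz pitch at 800 kHz.
    d, p = scale_for_clock(800_000, 250, 100)
    print(f"{d:.1f} ms, {p:.1f} Hz")   # about 225 ms and 111 Hz
```

Because the filter passbands move with the same factor, the formant pattern of each phoneme is preserved even though pitch and speed change.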

Two pitch control lines allow gross adjustment of the excitation signal 
frequency without disturbing any parameters of the filter network. This 
allows more than one voice to be emulated. 
STD-WORDS OPERATION


Latch U5 receives the 8-bit data word from the bus, latches it in response
to the WRITE ENA* signal generated elsewhere on the board, and applies the
lower six bits (the phoneme selection word) directly to the control word
inputs of the SC-01 phoneme synthesizer chip. The SC-01 is a CMOS device and
is operated from the +12 VDC power supply line. Its six data inputs,
however, are TTL compatible and only require the outputs of latch U5 to be
pulled up by resistor network UR1.

The MSB and next-to-MSB data bits (I1 and I2) are inflection bits. They are
inverted and level-shifted to 12 volts by transistors Q1 and Q2 before being
applied to the SC-01 device.

While the SC-01 is "talking" (before the phoneme in progress times out), it
outputs a high on the A/R line. This signal is inverted and driven onto the
bus by four elements of U8. This low on the MSB indicates to the bus that
the SC-01 is busy. The processor tests the MSB by reading the same port the
data word was sent to (activating IORQ* and RD*). When the MSB goes high,
the STD-WORDS card is ready for another data byte.

There are no timing problems with the "write a data byte, watch the MSB,
write another data byte" procedure, because the SC-01 continues to produce
the last phoneme until it is told to be silent or to stop, or to speak the
next one.

A pair of 4-bit magnitude comparators (U6 and U7) check for a match between
the port address selected on the header and the address on the bus. If they
match, the on-board ADDRESS DECODE signal goes true. This signal is applied
to the read/write decode logic (U4 and three elements of U8). The bus RD*,
IORQ*, and WR* signals are inverted and applied to decoder gates U4.

A coincidence of IORQ*, RD*, and a valid port address (as detected by U6 and 
U7) creates an on-board READ ENA* signal which drives the MSB of the port 
with the A/R signal from the SC-01 chip. This is used when the bus 
processor tests the MSB of the port to see if the SC-01 is finished with the 
last phoneme and is ready for the next one. 
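The read/write decode described in the last two paragraphs reduces to two AND conditions on the address match. This Python sketch models it at the logic level only. The function name is hypothetical, and it treats the comparators as a simple 8-bit equality test against the strapped address, ignoring gate delays and the actual U4/U8 gate arrangement.

```python
def decode(address, strap, iorq_n, rd_n, wr_n):
    """Return (READ ENA*, WRITE ENA*); all strobes are active low (0 = asserted)."""
    address_decode = (address & 0xFF) == strap        # U6/U7 comparator match
    read_ena = address_decode and iorq_n == 0 and rd_n == 0
    write_ena = address_decode and iorq_n == 0 and wr_n == 0
    return int(not read_ena), int(not write_ena)      # back to active-low levels


if __name__ == "__main__":
    # Card strapped for port A5H (hypothetical); an I/O read cycle to A5H.
    print(decode(0xA5, 0xA5, iorq_n=0, rd_n=0, wr_n=1))   # (0, 1): READ ENA* asserted
```

A read cycle asserts READ ENA* (status onto the MSB), a write cycle asserts WRITE ENA* (data into U5 and trigger U3), and any other address leaves both high.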

To write a data byte to the SC-01, the bus processor addresses the STD-WORDS
card and activates the IORQ* and WR* lines. These signals are decoded, and a
WRITE ENA* signal is applied to the data latch (U5) and one-shot U3. The
first section of U3 delays the SC-01 STB strobe, allowing the data byte to
be latched (U5) and applied to the data lines of the SC-01, and the SC-01
data inputs to settle, before the chip is strobed.

The second half of U3 generates the appropriate pulse width for the SC-01
strobe signal. The I/O write pulse coming from the processor card is very
narrow (on the order of a microsecond) and is much too brief for the CMOS
SC-01.

The clock signal for the SC-01 is generated on the chip, and its frequency
is controlled by C7 and the resistor network attached to pins 15 and 16 of
the SC-01. The clock rate can be adjusted over a small range, making the
overall pitch higher or lower, with VR1, the trim pot closest to the card
ejector.

Speech output is obtained from pins 20, 21, and 22. This signal is applied 
to the input of audio power amplifier U1 through volume control trim pot 
VR2. The amplified speech is then available at J-2, a miniature phone jack. 
OPTION SELECTION


Jumper-selectable options, such as the port address, are selected by
installing a shorting strap across a pair of control header pins. The
STD-WORDS card uses a special three-row control header. The center row is
connected to the control circuitry and the outside rows are connected to the
power supply rails.


A selection is made by strapping the center pin to either of the outside 
pins as required by the desired port address. The pins are situated 
directly across the header from each other allowing the option to be 
selected by wire-wrapping or inserting a Berg type strap. 


PORT ADDRESSING 


ADDRESS SELECTION HEADER J-3

ADDRESS   OUTSIDE   CENTER   OUTSIDE
LINE      PIN       PIN      PIN
A7          1        (2)        3
A6          4        (5)        6
A5          7        (8)        9
A4         10       (11)       12
A3         13       (14)       15
A2         16       (17)       18
A1         19       (20)       21
A0         22       (23)       24

(Center pins are shown in parentheses. The two outside rows are the Vcc and
Vss supply rails.)


ADDRESS STRAP SELECTION CHART

HEX      A7 A6 A5 A4  (MSB digit)
DIGIT    A3 A2 A1 A0  (LSB digit)

0         0  0  0  0
1         0  0  0  1
2         0  0  1  0
3         0  0  1  1
4         0  1  0  0
5         0  1  0  1
6         0  1  1  0
7         0  1  1  1
8         1  0  0  0
9         1  0  0  1
A         1  0  1  0
B         1  0  1  1
C         1  1  0  0
D         1  1  0  1
E         1  1  1  0
F         1  1  1  1


NOTES

0 indicates a strap connected from the center pin to the Vss (ground) pin
on header J-3.

1 indicates a strap connected from the center pin to the Vcc (+5 VDC) pin
on header J-3.


Figure 5. STD-WORDS Port Address Charts 
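Reading the chart for a given port amounts to writing the address in binary: each 1 bit is a strap to Vcc, each 0 bit a strap to Vss. The Python helper below (a hypothetical name, for planning straps only) applies the rule from the notes above.

```python
def strap_settings(port):
    """Map an 8-bit port address to J-3 straps: 1 -> Vcc pin, 0 -> Vss pin."""
    if not 0 <= port <= 0xFF:
        raise ValueError("port address must be 00H-FFH")
    return {f"A{bit}": "Vcc" if (port >> bit) & 1 else "Vss"
            for bit in range(7, -1, -1)}


if __name__ == "__main__":
    # Example: port A5H (MSB digit A = 1010, LSB digit 5 = 0101).
    for line, rail in strap_settings(0xA5).items():
        print(line, rail)
```
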
IOEXP


IOEXP (I/O EXPand) is used in some STD Bus systems to double the number of
I/O ports available. There are 256 strappable I/O ports in a basic system.
When IOEXP is pulled high, another 256 ports sharing the same logical
addresses become available. This line is usually tied to either +5 VDC or
ground, but may be toggled either way during system operation according to
the requirements of the boards on the bus.

The IOEXP line is, essentially, an additional address line, and I/O boards
are the most likely to use it. When a port is addressed, an I/O function on
a board strapped for that port address is selected. If IOEXP is used, two
I/O functions may be strapped for the same address, with the IOEXP line
determining which one is activated.

In most cases, IOEXP is ignored. To configure the STD-WORDS card to ignore
IOEXP, strap pins one and two of J-4 together. The board is shipped with
this option strapped.

To configure the STD-WORDS board for use in systems where IOEXP is used, 
strap pins one and three on J-4 together. The board then becomes active 
only when its port is addressed AND IOEXP is high. 
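The two J-4 options combine with the port address comparison as follows. This Python sketch is a logic-level illustration only; the function and parameter names are hypothetical.

```python
def board_enabled(address, strap, ioexp, j4_pins_1_3=False):
    """True when the card should respond to a bus cycle.

    j4_pins_1_3=False: pins 1-2 strapped (as shipped), IOEXP ignored.
    j4_pins_1_3=True:  pins 1-3 strapped, IOEXP must also be high.
    """
    if (address & 0xFF) != strap:
        return False
    if j4_pins_1_3:
        return ioexp == 1
    return True


if __name__ == "__main__":
    print(board_enabled(0xA5, 0xA5, ioexp=0))                     # True
    print(board_enabled(0xA5, 0xA5, ioexp=0, j4_pins_1_3=True))   # False
```
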




Figure 6. IOEXP HEADER 
USING THE STD-WORDS


Using the STD-WORDS card is simple and straightforward. The procedure is to
write a data byte to the STD-WORDS port with an I/O write cycle (which
activates the IORQ* and WR* bus lines). The card latches the data byte and
speaks the selected phoneme. When the phoneme's timing is complete (while
the phoneme is still sounding), the MSB of the port goes high. The next data
byte is then written to the card, and so on.

While the data byte is eight bits wide, only six are required for phoneme
selection. The other two (MSB and next-to-MSB) are inflection bits
specifying one of four subtle pitch variations. This helps to alleviate the
"robot monotone" quality of most phoneme-synthesized speech and makes the
output more life-like. The inflection bits act on the phoneme specified by
the other six bits, and can produce effects such as the upward movement in
pitch in the last few sounds of a question.
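Packing a data byte is a matter of placing the 2-bit inflection value in bits 7 and 6 above the 6-bit phoneme code. The Python helper below (hypothetical name) does that. Because the card inverts the inflection bits on the way to the chip, which bus value gives which pitch is not spelled out here, so the inflection argument is just the raw 0 to 3 value written to the bus.

```python
def phoneme_byte(code, inflection=0):
    """Combine a 6-bit phoneme code (bits 0-5) with a 2-bit inflection (bits 6-7)."""
    if not 0 <= code <= 0x3F:
        raise ValueError("phoneme code must be 00H-3FH")
    if not 0 <= inflection <= 3:
        raise ValueError("inflection must be 0-3")
    return (inflection << 6) | code


if __name__ == "__main__":
    print(hex(phoneme_byte(0x1B, 3)))   # H with both inflection bits set
```
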

The SC-01 chip, once started, produces sound (the selected phoneme) until it
is instructed to be quiet for a specific time with a PAUSE phoneme, to shut
up entirely with a STOP command, or to produce the next phoneme.

The following flowchart, Z-80 machine language code, and DATA table describe
how a processor can send a string of phonemes to the STD-WORDS card to make
it speak the word "HELLO". Similar sequences can be used to send a single
sound, a group of sounds (a word), or a group of words to the card, and can
be implemented with high-level languages and look-up tables for accurate
text-to-speech operation.

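Z80 CODE

        LD   HL,DATA     ; point HL at the first phoneme code
READ:   IN   A,(PORT)    ; read the STD-WORDS port (PORT = address strapped on J-3)
        BIT  7,A         ; test the MSB (SC-01 A/R status)
        JP   NZ,READ     ; loop while bit 7 is set
        LD   A,(HL)      ; fetch the current phoneme code
        OUT  (PORT),A    ; write it to the STD-WORDS card
        INC  HL          ; point to the next phoneme code
        LD   A,3FH       ; 3FH is the STOP code
        CP   (HL)        ; is the next code a STOP?
        JP   NZ,READ     ; no: wait for the card, then send it
        RET              ; yes: end of the phoneme string

DATA:   DB   1BH,02H,23H,18H,23H,35H,37H,3EH,3FH   ; "HELLO" (see DATA table)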


NOTE: In this program, it is important to silence the chip with a PAUSE
before issuing a STOP command.


FLOWCHART

START
Set A Pointer To 1st Sound
Get STD-WORDS Data From Port
Is Bit 7 Set? (yes: get data from port again; no: continue)
Output Sound To STD-WORDS Card
Point To Next Sound
Is It A STOP? (no: get data from port again; yes: STOP)

DATA

PHONEME   HEX
H         1B
EH1       02
UH3       23
L         18
UH3       23
O1        35
U1        37
PA1       3E
STOP      3F
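For readers porting the routine to a high-level language, here is a Python sketch that mirrors the Z80 listing step by step, with port access passed in as functions. The polling test follows the listing (loop while bit 7 is set); if your card reports ready as bit 7 high, as the prose under STD-WORDS OPERATION describes, invert that test. `FakeCard` is a hypothetical stand-in for testing without hardware, and the length check is an addition the Z80 code does not have.

```python
STOP = 0x3F  # STOP code; marks the end of the phoneme string


def speak(data, read_port, write_port):
    """Send phoneme codes the way the Z80 listing does.

    Polls the status port while bit 7 is set, writes one code, and returns
    when the NEXT code is STOP (the STOP byte itself is not sent).
    Returns the number of codes written.
    """
    i = 0
    while True:
        while read_port() & 0x80:            # IN A,(PORT) / BIT 7,A / JP NZ,READ
            pass
        write_port(data[i])                  # LD A,(HL) / OUT (PORT),A
        i += 1                               # INC HL
        if i >= len(data) or data[i] == STOP:  # LD A,3FH / CP (HL)
            return i


class FakeCard:
    """Hypothetical test double: reports bit 7 set for a few reads after each write."""

    def __init__(self, busy_reads=3):
        self.busy_reads = busy_reads
        self.remaining = 0
        self.written = []

    def read(self):
        if self.remaining:
            self.remaining -= 1
            return 0x80
        return 0x00

    def write(self, value):
        self.written.append(value)
        self.remaining = self.busy_reads


if __name__ == "__main__":
    hello = bytes([0x1B, 0x02, 0x23, 0x18, 0x23, 0x35, 0x37, 0x3E, 0x3F])
    card = FakeCard()
    print(speak(hello, card.read, card.write))   # 8 codes sent; STOP not sent
```
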
PHONEME SOUND CHART

Phoneme  Phoneme  Example      Phoneme  Phoneme  Example
Code     Symbol   Word         Code     Symbol   Word

00       EH3      jacket       20       A        day
01       EH2      enlist       21       AY       day
02       EH1      heavy        22       Y1       yard
03       PA0      no sound     23       UH3      mission
04       DT       butter       24       AH       mop
05       A2       made         25       P        past
06       A1       made         26       O        cold
07       ZH       azure        27       I        pin
08       AH2      honest       28       U        move
09       I3       inhibit      29       Y        any
0A       I2       inhibit      2A       T        tap
0B       I1       inhibit      2B       R        red
0C       M        mat          2C       E        meet
0D       N        sun          2D       W        win
0E       B        bag          2E       AE       dad
0F       V        van          2F       AE1      after
10       CH*      chip         30       AW2      salty
11       SH       shop         31       UH2      about
12       Z        zoo          32       UH1      uncle
13       AW1      lawful       33       UH       cup
14       NG       thing        34       O2       for
15       AH1      father       35       O1       aboard
16       OO1      looking      36       IU       you
17       OO       book         37       U1       you
18       L        land         38       THV      the
19       K        trick        39       TH       thin
1A       J*       judge        3A       ER       bird
1B       H        hello        3B       EH       get
1C       G        get          3C       E1       be
1D       F        fast         3D       AW       call
1E       D        paid         3E       PA1      no sound
1F       S        pass         3F       STOP     no sound

* T must precede CH to produce the CH sound.
* D must precede J to produce the J sound.
STD-Z80 BUS CONNECTOR

Bus Connector: 56 pin dual edge connector, 0.125 inch contact centers

SIGNAL NAME    PIN NUMBERS    SIGNAL NAME
+5 VDC           2     1      +5 VDC
GROUND           4     3      GROUND
N/C              6     5      N/C
D7               8     7      D3
D6              10     9      D2
D5              12    11      D1
D4              14    13      D0
N/C             16    15      A7
N/C             18    17      A6
N/C             20    19      A5
N/C             22    21      A4
N/C             24    23      A3
N/C             26    25      A2
N/C             28    27      A1
N/C             30    29      A0
RD*             32    31      WR*
N/C             34    33      IORQ*
N/C             36    35      IOEXP
N/C             38    37      N/C
N/C             40    39      N/C
N/C             42    41      N/C
N/C             44    43      N/C
N/C             46    45      N/C
N/C             48    47      N/C
N/C             50    49      N/C
PCI             52    51      PCO
N/C             54    53      N/C
N/C             56    55      +12 Volts


SPECIFICATIONS

ELECTRICAL

System Bus:              STD Bus
Audio Output:            0.8 Watt into 8 Ohm load
I/O Address:             Single I/O Port, Jumper Selectable
System Interrupt Units:  0 SIU

Bus Signal Loading:
  Inputs:   one 74LS load maximum
  Outputs:  -3 mA min @ 2.4 volts
            24 mA min @ 0.5 volts

Operating Temperature:   0° to 60° C

Power Requirements @ 25° C:

PARAMETER   CONDITION   MIN     TYP     MAX     UNITS
Vcc         +5 VDC      4.75    5.00    5.25    volts
Icc         +5 VDC      -       100     125     mA
Vcc         +12 VDC     11.50   12.00   12.50   volts
Icc         +12 VDC     -       100     75      mA
MECHANICAL

Card Dimensions:

FORM FACTOR   H      W      L      UNITS
STD Bus       0.60   4.50   6.50   inches

PC Board Thickness: 0.062 inch

CONNECTORS

STD Bus        56 pin dual readout, 0.125 inch centers
Audio Output   Standard miniature phone jack

ORDERING INFORMATION

ITEM        DESCRIPTION
STD-WORDS   STD Bus Phoneme Speech Synthesizer Board
STM-WORDS   Technical Manual for STD-WORDS board


VOTRAX is a registered trademark of Federal Screw Works 